Frequency of Pattern Occurrences in a (DNA) Sequence
Abstract
Consider a given pattern H and a random text T of length n. We assume that consecutive symbols in the text are generated either independently or with a Markovian dependency, i.e., we study both the so-called Bernoulli model and the Markovian model. Our goal is to assess the limiting distribution of the frequency of pattern occurrences in a random sequence. Overlapping copies of a pattern are counted separately. We prove that the number of pattern occurrences tends to a normal distribution, and we derive explicit and asymptotic formulas for the mean and the variance of the number of pattern occurrences. In the course of the derivation we compute the probability of exactly r occurrences of H in the text T. We derive the generating function of this probability, and using an analytical technique we derive in a uniform manner all the results announced above. Applications of these results range from wireless communications to approximate pattern matching, molecular biology, games, codes, and stock market analysis. These findings are of particular interest for molecular biology problems such as gene recognition and finding patterns with unexpectedly high or low frequencies (the so-called contrast words).
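As an illustration of the counting convention and the Bernoulli model described above, the following minimal Python sketch counts overlapping occurrences of a pattern and compares a simulated average with the standard first-moment value (n - m + 1) P(H), where m is the pattern length and P(H) is the product of the symbol probabilities. The function names, the uniform DNA alphabet, and the simulation parameters are illustrative assumptions, not taken from the paper, and the sketch covers neither the variance nor the Markovian model.

```python
import random


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text; overlapping copies count separately."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)


def expected_count_bernoulli(pattern, n, probs):
    """Mean number of occurrences in a length-n text with independent symbols.

    probs maps each symbol to its probability (hypothetical input format).
    """
    p_h = 1.0
    for symbol in pattern:
        p_h *= probs[symbol]
    # There are n - m + 1 possible starting positions, each matching with
    # probability P(H); linearity of expectation gives the mean.
    return (n - len(pattern) + 1) * p_h


def simulate_mean(pattern, n, probs, trials=2000, seed=0):
    """Estimate the mean count by sampling random texts (illustrative only)."""
    rng = random.Random(seed)
    symbols = list(probs)
    weights = [probs[s] for s in symbols]
    total = 0
    for _ in range(trials):
        text = "".join(rng.choices(symbols, weights=weights, k=n))
        total += count_overlapping(text, pattern)
    return total / trials


if __name__ == "__main__":
    uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(count_overlapping("AAAA", "AA"))
    print(expected_count_bernoulli("ACG", 100, uniform_dna))
    print(simulate_mean("ACG", 100, uniform_dna))
```

Because overlapping copies are counted separately, the text AAAA contains three occurrences of AA rather than two. The mean is unaffected by self-overlap of the pattern; the overlap structure matters for the variance, which this sketch does not compute.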